Abstract: We propose a method for the Distributed Assessment
of Network CEntrality (DANCE) in complex networks. DANCE 
attributes to each node a volume-based centrality computed using 
only localized information, thus not requiring knowledge of the 
full network topology. We show DANCE is simple, yet efficient, 
in assessing node centrality in a distributed way. Our proposal 
also provides a way for locating the most central nodes, again 
using only the localized information at each node. We also show 
that the node rankings based on DANCE's centrality and the 
traditional closeness centrality correlate very well. This is quite 
useful given the vast potential applicability of closeness centrality, 
which is however limited by its high computational costs. We 
experimentally evaluate DANCE against a state-of-the-art pro- 
posal to distributively assess network centrality. Results attest 
that DANCE achieves similar effectiveness in assessing network 
centrality, but with a significant reduction in the associated costs 
for practical applicability. In contrast to previous work, this 
outcome allows DANCE to be applied to large-scale networks. 

I. Introduction 

The concept of network centrality is an important tool to
analyze complex networks [1]-[3]. In broad terms, network
centrality measures the relative importance of nodes in a
complex network. Different ways of measuring centrality have
been proposed for decades [4]-[7], each of them suited to
assess node centrality from a different point of view. Examples
include using network centrality to evaluate network robustness
to fragmentation or to identify the most important nodes
for efficient information spreading in diffusion networks.

As the definitions for centrality vary, so varies the difficulty
in computing centrality, ranging from low cost (e.g., degree
centrality) to others far more costly, such as betweenness
and closeness centralities. The latter two, even though very
useful, are costly because they rely on the determination
of the shortest path between all pairs of nodes, thus also
requiring full knowledge of the network topology. A high
computational cost and the requirement of full knowledge of
network topology become a significant obstacle for applying
the general concept of network centrality to the large-scale
complex communication networks we face nowadays, such as
the Internet routing structure, online social networks, P2P
networks, and content distribution networks. Hence, research has
been recently dedicated to finding new ways of dealing with
network centralities in large-scale complex networks
(Section IV reviews related work). Typically, these recent efforts
either (i) optimize the way by which traditional centralities are
calculated or approximated [8]-[10]; or (ii) propose methods
to assess network centrality in a distributed way without
requiring full knowledge of the network topology [11]-[13].



In this paper, we propose DANCE (Distributed Assessment 
of Network CEntrality), a distributed method to assess network 
centrality and to locate nodes with high centrality based only 
on localized information restricted to a given subgraph around 
each node. As a first contribution, DANCE computes centrality 
in a fully distributed way without requiring knowledge about 
the full topology of the network. A second contribution is to 
provide an algorithm for locating the node with the highest 
centrality in the network, characterized by a global maximum 
of centrality, and also the other relevant central nodes charac- 
terized by local maxima in centrality. This is also performed 
in a distributed way using the same limited neighborhoods 
adopted for assessing the network centrality. 

We show DANCE is simple, yet efficient, in distributively
assessing network centrality. This conclusion results from a
thorough evaluation of DANCE using both synthetically generated
networks and traces of real-world networks of different
kinds and scales. In our evaluation, we directly compare
DANCE with SOC (Second Order Centrality), a state-of-the-art
proposal for assessing network centrality in a distributed
way by Kermarrec et al. [13]. Compared to SOC, DANCE
provides similar results in assessing network centrality in a
distributed way, but with message complexity one order
of magnitude lower than SOC and further reduced applicability
costs, thus rendering DANCE more suitable for practical
application in the current large-scale complex networks.

Another key outcome is that the node ranking using the 
network centrality computed by DANCE is highly correlated 
with the node ranking using the traditional closeness centrality, 
which requires high computational costs and full knowledge 
of network topology. This outcome is quite useful given the 
vast potential applicability of closeness centrality, which is 
seldom applied to large-scale complex networks due to its 
high computational costs even if the full network topology is 
known. We also investigate the trade-off between accuracy in 
such a correlation and DANCE costs as a function of the extent 
of the neighborhood around each node DANCE considers to 
compute the network centrality in a distributed way. DANCE 
thus contributes with a simple and efficient alternative to assess 
network centrality in large-scale networks, as we show with 
practical examples. 

This paper proceeds as follows. Section II introduces
DANCE. Section III presents results obtained from applying
DANCE to a diverse set of synthetic and real-world networks.
We describe related work in Section IV. Finally, Section V
concludes the paper and discusses future work.
II. DANCE

This section describes how DANCE assesses network centrality
in a distributed way and locates high-centrality nodes.

A. Distributed assessment of network centrality 

We consider a network as equivalent to an undirected simple
graph. The distance between two nodes in the network is
defined as the number of hops h in the shortest path between
these nodes. We define the h_neighborhood of a given node n
as the subgraph containing the nodes whose distance to node n
is less than or equal to h. We refer to this h_neighborhood of
each node n as h_neigh(n). Therefore, for instance, the
0_neigh(n) is the subgraph containing just the node n. The
1_neigh(n) is the subgraph containing the node n and all its
direct neighbors, while the 2_neigh(n) is the subgraph
containing node n, its neighbors, and all neighbors of its neighbors.

In DANCE, the centrality value of each node in the network 
is defined as the volume of the h_neighborhood of that node. 
The volume v(h_neigh(n)) of an h_neighborhood is thus

$$v(h\_neigh(n)) = \sum_{i \in h\_neigh(n)} d_i \qquad (1)$$

where d_i is the degree of node i. Note that this also includes
all edges that connect nodes in the h_neighborhood to nodes
outside it. Figure 1 illustrates the h_neighborhoods for the
black node considering different h values. For instance, the
2_neigh(n) of the black node n in Figure 1 is composed of 14
nodes with a total volume of 53. Clearly, (h + 1)_neigh(n) ⊇
h_neigh(n) and v((h + 1)_neigh(n)) ≥ v(h_neigh(n)).

Fig. 1. Illustration of the h_neighborhood in DANCE.

To compute the volume-based centrality values for all the
nodes in the network, we choose a value for h and then find
the volume of the h_neighborhood for each node. Clearly,
the obtained result is directly impacted by the choice of h.
With h = 0, this localized centrality becomes the traditional
degree centrality. With h > 0, each node needs to discover its
own h_neighborhood along with the degree of each node
belonging to it. To achieve this, Algorithm 1 is run at each
node. Each node sends its identity and degree to each of
its neighbors in a message with time-to-live (TTL) equal to
h. This message also carries a unique message id (e.g., the
originator node id plus a time stamp) in order to prevent
retransmissions of repeated messages. Upon receiving such a
message, each node checks the message id to determine if it
has received this message before. If the message is new, the
node stores the provided information, since it is necessary
for determining its own h_neighborhood. As only localized
information is required, buffer complexity at each node n is
limited to O(|h_neigh(n)|). The node then decrements the
TTL. If the TTL is not zero, the node relays the message to
all its neighbors; otherwise, no further action is taken. This
runs in parallel at each node and, after h steps, all nodes know
their h_neighborhoods and the degrees of their member nodes.

Algorithm 1 discover_h_Neighborhood(n)

    h_neighborhood ← EMPTY
    localTopologyChanged ← TRUE
    while TRUE do
        waitFor receivedMSG OR localTopologyChanged
        if receivedMSG then
            if receivedMSG is NEW then
                mark receivedMSG as not NEW
                update h_neighborhood
                signal h_neighborhood_updated
                decrement receivedMSG.ttl
                if receivedMSG.ttl > 0 then
                    send receivedMSG to All Neighbors
                end if
            end if
        end if
        if localTopologyChanged then
            MSG ← createMSGAllNeighbors(ttl ← h)
            send MSG to All Neighbors
            localTopologyChanged ← FALSE
        end if
    end while
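As a centralized illustration of Eq. (1), the following minimal Python sketch computes the h_neighborhood of a node with a breadth-first search and sums the degrees of its members. It is only a sketch under stated assumptions: the adjacency-dictionary representation, the function names `h_neighborhood` and `volume_centrality`, and the toy graph are hypothetical and not part of DANCE, which obtains the same information through message exchange.

```python
from collections import deque


def h_neighborhood(adj, n, h):
    """Return the set of nodes within h hops of n (n itself included)."""
    dist = {n: 0}
    queue = deque([n])
    while queue:
        u = queue.popleft()
        if dist[u] == h:
            continue  # nodes at distance h are kept but not expanded
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def volume_centrality(adj, n, h):
    """Volume of the h_neighborhood of n: the sum of its members' degrees."""
    return sum(len(adj[i]) for i in h_neighborhood(adj, n, h))


# Hypothetical toy graph: the path 0-1-2-3-4 plus a leaf 5 attached to node 2.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3, 5}, 3: {2, 4}, 4: {3}, 5: {2}}

for h in (0, 1, 2):
    print(h, volume_centrality(adj, 2, h))
```

With h = 0 the value reduces to the degree of the node, and once the h_neighborhood covers the whole graph the value equals twice the number of edges.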

We analyze the message and time complexity of Algorithm 1
in the following. First consider the extreme case
of h being sufficiently large to cover the whole network, i.e., the
h_neighborhood of each node encompasses every other node
in the network. In this case, since every node only forwards
new messages, an absolute upper bound for the number of
messages equals the number of nodes times the total volume of
the network. When applying DANCE to practical cases, however,
an h significantly lower than the radius¹ of the network
(e.g., h = 2) is enough to generate localized information able
to achieve a suitable trade-off between efficiency in assessing
network centrality and applicability costs, as we analyze later
in Section III-E. For h = 2, each node sends a message to all
its neighbors and these neighbors in turn forward each received
message to their neighbors. Therefore, for h = 2, the expected
message complexity is O(n × d_avg^2), where n is the number
of nodes in the network and d_avg is the network average
degree. As for the time complexity, the information generated
at each node has to spread for h hops in order to reach all its
destinations. Therefore, the expected time complexity is O(h)
steps, i.e., O(1) once an h is chosen. In our evaluation, we compare
DANCE costs to those of previous work in Section III-B.
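The message count discussed above can be checked on a small example. The sketch below is a sequential simulation of the TTL-limited flooding of Algorithm 1, not the distributed protocol itself. It assumes a static topology, reliable FIFO delivery, and the origin id used as the message id; the function name `flood` and the toy graph are hypothetical.

```python
from collections import deque


def flood(adj, h):
    """Simulate TTL-limited flooding; return (known, sent).

    known[n] maps every origin heard by n (and n itself) to its degree;
    sent counts all point-to-point transmissions, duplicates included.
    """
    known = {n: {n: len(adj[n])} for n in adj}
    in_flight = deque()
    sent = 0
    if h > 0:
        # Each node announces (id, degree) to every neighbor with TTL = h.
        for n in adj:
            for w in adj[n]:
                in_flight.append((w, n, len(adj[n]), h))
                sent += 1
    while in_flight:
        receiver, origin, degree, ttl = in_flight.popleft()
        if origin in known[receiver]:
            continue  # repeated message: dropped without relaying
        known[receiver][origin] = degree
        ttl -= 1
        if ttl > 0:
            for w in adj[receiver]:  # relay to all neighbors
                in_flight.append((w, origin, degree, ttl))
                sent += 1
    return known, sent


# Hypothetical toy graph: the path 0-1-2-3-4 plus a leaf 5 attached to node 2.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3, 5}, 3: {2, 4}, 4: {3}, 5: {2}}
known, sent = flood(adj, 2)
print(sent, sum(known[2].values()))
```

For h = 2, every node first sends one message per neighbor, and every neighbor relays each new message once to all of its own neighbors. In this simulation the total is therefore the sum of the degrees plus the sum of the squared degrees, which is the quantity behind the n × d_avg^2 estimate.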



Algorithm 2 presents how the localized volume-based centrality
is computed at each node. No message exchange is
involved, as the required information is already available from
the execution of Algorithm 1, and the main computation is the
summing of the degrees of the nodes within the h_neighborhood.
Even so, a timer is included to avoid repeating calculations
while the knowledge of the neighborhood is still unstable.

Algorithm 2 calculateCentrality()

    Timer ← DISABLE
    while TRUE do
        waitFor h_neighborhood_updated OR Timer.done
        if h_neighborhood_updated then
            restart Timer
        end if
        if Timer.done then
            calculate neighborhood_volume
            if neighborhood_volume has changed then
                signal neighborhood_volume_Changed
            end if
        end if
    end while

¹ The radius r of a graph G(V, E) is the minimum eccentricity
of any node, i.e., r = min_{i ∈ V} max_{j ∈ V} d(i, j), where d(i, j) is
the shortest path distance between nodes i and j.

B. DANCE properties 

As we previously mentioned, in the trivial case where
h = 0, the localized volume-based centrality exactly matches
the traditional degree centrality. However, as h increases, the
h_neighborhood with the largest volume in general coincides
with the h_neighborhood with the largest number of
nodes. Moreover, since the volume considers all the connections
to nodes outside the neighborhood, this means that
the h_neighborhood with the largest volume is associated with the
(h + 1)_neighborhood with the largest number of nodes. We
observe that this construction is highly related to the concept of
the traditional closeness centrality, since closeness centrality
can be defined in terms of how many nodes can be reached at
increasing distances from the node in consideration.

Another property of the largest volume h_neighborhood
among all nodes is that, for a given h value, it is the
h_neighborhood that best approximates the whole network.
Even though we do not present a formal proof here, we
provide a short intuitive motivation. Consider the adjacency
matrix A_{n×n} of a given network. The h_neighborhood can
be represented as an adjacency matrix H_{n×n} with the same
structure and dimensions as that of the whole network, since the
h_neighborhood is a subgraph of the network. The proximity
between these adjacency matrices for the whole network and
a given h_neighborhood can be measured by taking into
account the Frobenius norm of the difference between these
two adjacency matrices. We recall that the Frobenius norm of
a matrix is defined as the square root of the sum of the squares
of the elements of a matrix, i.e., in our case:

$$\|\Delta\|_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} |\delta_{ij}|^2} \qquad (2)$$

where Δ = A - H and δ_ij denotes the entries of Δ. Therefore,
when subtracting for a given h the h_neighborhood adjacency
matrix from the adjacency matrix for the whole network, the
result for the largest volume h_neighborhood (i.e., the largest
subgraph) has the smallest Frobenius norm.
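A short worked step makes this intuition concrete. It assumes, as stated above, that H is the adjacency matrix of a subgraph of the network padded to n × n, so that all entries are 0 or 1 and h_ij ≤ a_ij. The symbols E and E_H, denoting the edge sets of the whole network and of the h_neighborhood, are introduced here only for illustration.

```latex
\|A - H\|_F^2
  = \sum_{i=1}^{n} \sum_{j=1}^{n} (a_{ij} - h_{ij})^2
  = \sum_{i=1}^{n} \sum_{j=1}^{n} (a_{ij} - h_{ij})
  = 2\,\bigl(|E| - |E_H|\bigr)
```

The second equality holds because every difference a_ij - h_ij is either 0 or 1, and the third because each undirected edge contributes two entries to a symmetric adjacency matrix. Under these assumptions, the norm decreases exactly as the subgraph retains more edges of the network.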



From these properties, we can expect the localized volume-based
centrality provided by DANCE to correlate well with the
traditional closeness centrality. This is indeed experimentally
confirmed in Section III-C. We can also expect the strength
of this correlation to depend on the network topology as well
as on the choice of h. Later, in Section III-E, we empirically
show that for synthetic networks and trace-based real-world
networks of different kinds (both in structure and scale),
h = 2 provides a suitable choice.

C. Navigation procedure towards central nodes 

As a second contribution, DANCE offers a navigation procedure
to locate local and global maxima in centrality starting
from any node in the network. This procedure uses only the
localized volume-based centrality at each node computed by
DANCE, as described in Section II-A. Broadly speaking, after
each node knows the centrality of all other nodes within
its h_neighborhood, each node can use this information to
navigate towards higher centrality values. Before proceeding
with the navigation procedure, we introduce two definitions:

• A semi-critical node is a node with the highest centrality
value within its h_neighborhood. This means a semi-critical
node has the highest centrality value in a radius h
around itself, and consequently it plays a special role in
the location process, as discussed in the following;

• A critical node is a node that has the highest centrality
value in a radius of at least 2 × h, and is regarded as a local
maximum in centrality. This is equivalent to being the node
with the highest centrality value in the h_neighborhood
of all nodes that belong to its h_neighborhood.

From these definitions, we observe that a critical node is also
a semi-critical one. Figure 2(a) shows an example network.
When a node x is semi-critical but not critical, this means
that, even though this node x has the highest centrality value
in its own h_neighborhood (Figure 2(b)), there is at least
one node y within the h_neighborhood of node x that knows
another node z (outside the h_neighborhood of node x) with
an even higher centrality value than node x (Figure 2(c)).

We now describe the process that makes each node in
the network aware of the critical node related to it, and
also each critical node in the network aware of all other
existing critical nodes. The first step in this process is to
make each node aware of the centrality values of all nodes
belonging to its h_neighborhood. This is done using the same
mechanism described in Section II-A and Algorithm 1. Once
each node is aware of the centrality value of all nodes in its
h_neighborhood, each node selects and informs the node with
the highest centrality value of its h_neighborhood. At this
point, a tie might happen. In this case, the node ids are used
as a tie breaker (say the largest id prevails). As a consequence,
there is an increasing order of centrality values towards a
single critical node with the highest centrality. Hence, we
conclude that every network has at least one critical node and
also has a global centrality maximum.

Now that the existence of at least one critical node in every
network is guaranteed, we show in the following that, using the
same localized volume-based centrality values, it is possible to
create a path connecting an arbitrary node to a single critical
node. First, given an arbitrary node, it may be a critical node
or not. If it is a critical node, we are done. If it is not a
critical node, the node may still be a semi-critical node or not.
If the node is semi-critical (but not critical), this means that
there is at least one node in its h_neighborhood that knows a
node with a higher centrality value than itself. Since the
semi-critical node is aware of all nodes that selected it as the node
with the highest centrality value in their h_neighborhoods,
the semi-critical node is also aware of all nodes that did not.
Therefore, the semi-critical node selects as the next hop on
the path the node with the highest centrality value that did
not select it, and the process restarts from there. If the node
is neither critical nor semi-critical, it chooses the node with the
highest centrality value in its h_neighborhood as the next
hop and the process goes on from there. Algorithm 3 presents
the described navigation procedure of DANCE.

It follows from this discussion that each node can be
associated with a single critical node, therefore partitioning the
network. This partitioning is used to provide a way to make
every node in the network aware of the identity of its associated
critical node and also to make every critical node aware of the
identity of all other existing critical nodes in the network. In
order to achieve this, first each node has to know the identity of
its associated critical node. When a node becomes critical (i.e.,
when it receives the selection information from all nodes in its
h_neighborhood), the new critical node announces its critical
status to all nodes that selected it as the node with the
highest centrality value they are aware of. In turn, each node
that receives such information relays the identity of the new
critical node to all nodes that selected it as the node with the
highest centrality value. Thus, the identity of the critical node
flows back to the lower-centrality nodes along the inverse
of the path constructed by Algorithm 3. The only exceptions are
the semi-critical nodes and the nodes on their selection chains.
This happens because a semi-critical node selects itself as the
highest centrality node in its h_neighborhood and therefore
is not eligible to receive the notification of the new critical
node identity. To fix this, when a node becomes aware of its
status as a semi-critical node, it uses Algorithm 3 to locate its
associated critical node and spreads its identity down its
selection chain. This behavior is valid for every semi-critical
node. At the end of this process, every node in the network is
aware of the identity of the critical node associated with it.

Algorithm 3 LocateCriticalNode(currentNode)

    Input: currentNode
    Output: criticalNode

    waitFor node selected by currentNode is DEFINED
    highCNode ← node selected by currentNode
    if currentNode is NOT semi-critical then
        return NavigateToCriticalNode(highCNode)
    else
        if currentNode is critical node then
            return currentNode
        else
            nextNode ← GetNodeKnowsHigherC(currentNode)
            return NavigateToCriticalNode(nextNode)
        end if
    end if
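The selection step, the two definitions, and the navigation of Algorithm 3 can be illustrated with a centralized Python sketch. It is a minimal sketch under stated assumptions: all names (`dance_navigation`, `rank`, `selected`, `locate`) are hypothetical, the recursion of Algorithm 3 is written as a loop, and the toy graph is a 6-node cycle in which all centralities tie, so that only the id tie-breaker (largest id prevails) decides the selections.

```python
from collections import deque


def h_neighborhood(adj, n, h):
    """Nodes within h hops of n, including n itself."""
    dist = {n: 0}
    queue = deque([n])
    while queue:
        u = queue.popleft()
        if dist[u] == h:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def dance_navigation(adj, h):
    hood = {n: h_neighborhood(adj, n, h) for n in adj}
    # Centrality with tie-breaker: (volume, id), the largest id prevails.
    rank = {n: (sum(len(adj[i]) for i in hood[n]), n) for n in adj}
    # Each node selects the highest-ranked node of its h_neighborhood.
    selected = {n: max(hood[n], key=rank.get) for n in adj}
    semi = {n for n in adj if selected[n] == n}
    critical = {n for n in semi if all(selected[m] == n for m in hood[n])}

    def locate(n):
        while n not in critical:
            if n in semi:
                # Semi-critical, not critical: best node that did not select n.
                n = max((m for m in hood[n] if selected[m] != n), key=rank.get)
            else:
                n = selected[n]
        return n

    return semi, critical, {n: locate(n) for n in adj}


# Hypothetical toy graph: the cycle 5-1-4-0-3-2-5.
adj = {5: {1, 2}, 1: {5, 4}, 4: {1, 0}, 0: {4, 3}, 3: {0, 2}, 2: {3, 5}}
semi, critical, assoc = dance_navigation(adj, h=1)
print(sorted(semi), sorted(critical), assoc)
```

In this toy case, nodes 3 and 4 are semi-critical but not critical, because one of their neighbors knows node 5. Every node is eventually associated with node 5, the single critical node.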

At this point, each node knows the identity of the single 
critical node associated to it and relays this information to 
all of its direct neighbors. This way each node finds out if 
any of its neighbors is associated to a different critical node 
than itself. When this happens, the node relays the identity of 
this new critical node to its associated one. When a critical
node receives such information, the reported id may or may not
belong to a previously unknown critical node. If it is
an unknown one, the receiver relays the message to all its
previously known critical nodes. By the end of this process,
all nodes know their associated critical node and every critical 
node knows the identity of all other critical nodes in the 
network. This means that any node can know the identity of 
the highest centrality node in the network and also all local 
maxima, according to the critical node definition. 

The message complexity analysis of the DANCE navigation
procedure is divided into four phases. The first phase
concerns the step where critical and semi-critical nodes become
aware of their status. For this to happen, the centrality value
of each node has to be made known to all other nodes
in the h_neighborhood of this node. This is equivalent to
Algorithm 1, and so is the analysis of message complexity (see
Section II-A). In the second phase, each node becomes aware of
the identity of the critical node associated with it. This happens
as each node in the network receives a single message either
directly from its associated critical node or from a semi-critical
node associated with its critical node. Each of these messages
travels a number of hops no larger than the network diameter.
Therefore, the message complexity for this second phase is
O(n × D), where n is the number of nodes and D is the
network diameter. In the third phase, each node relays the
identity of its associated critical node to its neighbors. Since
each node sends a message to each one of its neighbors, the
number of messages generated in this phase equals the network
volume. In the fourth and last phase, the nodes that become aware
of neighbors associated with different critical nodes relay this
information to their critical node. Since not all nodes need to
do this, we note that the number of messages needed is smaller
than n × D. From this, we observe that the overall message
complexity is dominated by the first phase, and thus it has
the same message complexity as the centrality determination
algorithm described in Section II-A.



III. Performance Evaluation 

In this section, we evaluate the performance of DANCE 
in assessing the network centrality in a distributed way. In 
our evaluation, we first analyze the effectiveness of DANCE 
along with the complexity in applying our method. Then 
we present experimental results on the correlation of the 
distributed centrality assessment provided by DANCE and the 
closeness centrality, the impact of the size of neighborhood, 
and the applicability of DANCE in large-scale networks. When 
we compare DANCE with another distributed method for 
assessing network centrality found in the literature, we only 
consider the part of DANCE concerning the computation of 
the centrality values, because the other previous works in 
distributed assessment of network centrality do not offer any 
means of locating high centrality nodes. In this study, we 
apply DANCE to different network scenarios, showing the 
experimental applicability of DANCE and also complementing 
the complexity analysis performed in Section II.

A. DANCE effectiveness 

We evaluate DANCE effectiveness by comparing it to a
recent distributed algorithm for assessing network centrality
named SOC (Second Order Centrality) [13]. SOC is based on
a perpetual random walk and determines each node's centrality
by evaluating the standard deviation of the time interval
between visits of the random walk to each node. SOC is very
effective in determining a node's centrality in a distributed
way, but it is difficult to determine the stopping criteria
for it. This happens because in order to yield a reasonable
centrality assessment, each node in the network has to be
visited by the random walk a sufficient number of times to
make it possible to accurately compute the standard deviation
of the time intervals between visits. In principle, since the
network is not fully known, this poses a major problem for
the application of SOC. In this comparison, we consider SOC
configured to use 2 × 10^6 messages before stopping, as done
by Kermarrec et al. [13] for networks of the same dimensions
and structures as we consider here. DANCE uses h = 2. In
Section III-E, we show that h = 2 is a suitable choice.

We use two kinds of synthetic networks: scale-free networks
based on the Barabási-Albert (BA) model [14] and random
networks based on the Erdős-Rényi (ER) model [15]. The BA
networks have 1000 nodes and are created with 5 connections
per new node, resulting in a 9.95 mean node degree. The ER
networks also have 1000 nodes and a connection probability
p = 0.01, which corresponds to approximately 1.5 × ln(n)/n,
ensuring that the resulting networks are connected, as it is
known that p > ln(n)/n is a sharp threshold for the
connectedness of ER networks. The mean degree of the ER
networks can vary, but in this case remains close to 10.
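The figures quoted for the two models can be reproduced with simple arithmetic. The sketch below assumes that the BA construction starts from 5 seed nodes without edges, so that each of the remaining 995 nodes adds 5 links; this seed choice is an assumption made here and is not stated in the text. For the ER model it uses the well-known connectivity threshold ln(n)/n.

```python
import math

n = 1000

# BA model: assumed seed of m isolated nodes, m links per added node.
m = 5
ba_edges = m * (n - m)
ba_mean_degree = 2 * ba_edges / n

# ER model: connection probability versus the connectivity threshold.
p = 0.01
threshold = math.log(n) / n
er_mean_degree = p * (n - 1)

print(f"BA mean degree: {ba_mean_degree:.2f}")
print(f"ER threshold ln(n)/n: {threshold:.5f}, p / threshold: {p / threshold:.2f}")
print(f"ER expected mean degree: {er_mean_degree:.2f}")
```

Under these assumptions the BA mean degree matches the reported 9.95, p is roughly 1.45 times the threshold, and the expected ER mean degree is just under 10.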

For the evaluation process, we take 100 BA and 100
ER networks and subject them to a process of successively
removing the node with the highest centrality until the remaining
connected giant component has less than 20% of
the total initial nodes (i.e., the network becomes fragmented).
By applying such a procedure with both SOC and DANCE
methods, we observe which one needs fewer steps to fragment
the network, indicating that it locates the nodes that are more
important to the network connectivity and therefore indicating
the effectiveness of each method in assessing the network
centrality. In addition to effectiveness, we also evaluate
applicability costs for SOC and DANCE in Section III-B.
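The removal procedure can be sketched in a few lines of Python for the DANCE case. This is an illustrative sketch rather than the evaluation code: it assumes that centralities are recomputed after every removal and that ties are broken by the largest node id, neither of which is specified above. The function names and the toy tree are hypothetical.

```python
from collections import deque


def h_volume(adj, n, h):
    """DANCE centrality: sum of degrees over the h_neighborhood of n."""
    dist = {n: 0}
    queue = deque([n])
    while queue:
        u = queue.popleft()
        if dist[u] == h:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(len(adj[i]) for i in dist)


def giant_component_size(adj):
    """Size of the largest connected component."""
    seen, best = set(), 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        queue, size = deque([s]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def removals_to_fragment(adj, h=2, fraction=0.2):
    """Remove top-centrality nodes until the giant component < fraction * n."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    n0, removed = len(adj), 0
    while adj and giant_component_size(adj) >= fraction * n0:
        target = max(adj, key=lambda u: (h_volume(adj, u, h), u))
        for w in adj.pop(target):
            adj[w].discard(target)
        removed += 1
    return removed


# Hypothetical toy tree: center 0, hubs 10-12, three leaves per hub.
adj = {0: {10, 11, 12}, 10: {0, 1, 2, 3}, 11: {0, 4, 5, 6}, 12: {0, 7, 8, 9}}
for hub, leaves in ((10, (1, 2, 3)), (11, (4, 5, 6)), (12, (7, 8, 9))):
    for leaf in leaves:
        adj[leaf] = {hub}
print(removals_to_fragment(adj))
```

In this toy tree, the center and then the three hubs are removed before the largest component falls below 20% of the 13 initial nodes.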



Figure 3 shows the behavior of a BA and an ER network
under node removal. The nodes to be removed are determined
by SOC, DANCE, and a random choice. Results indicate that
the behavior observed under SOC and DANCE removal is
quite similar. In fact, to reduce the giant connected component
to 20% of the initial network size in the BA network, SOC
needs to remove 32.8% of the initial nodes in the network,
while DANCE needs to remove 35.8% of the initial nodes in
the network. To fragment the ER network, SOC requires the
removal of 50.6% of the initial nodes, while DANCE needs
the removal of 52.8% of the initial nodes. In contrast to using
SOC or DANCE to evaluate the centralities of nodes to be
removed, a randomly driven node removal policy needs to
remove more than 70% of nodes to fragment the network in
both BA and ER cases.

The same node removal procedure is applied to 100 BA and
100 ER networks with 1000 nodes each. The obtained results
in each case are quite similar to those shown in Figure 3,
indicating that the two particular networks used in Figures 3(a)
and 3(b) are representative of general cases for BA and ER
networks, respectively. Figure 4 shows a CDF, obtained from
the experiments on all BA and ER networks, of the
difference in the required fraction of node removals between
SOC and DANCE to reduce the main
giant connected component to 20% of the initial number
of nodes. The results show that for BA networks DANCE
needs less than 2.4% and 2.8% more removals than SOC in
90% and 99% of the cases, respectively. For ER networks,
DANCE requires less than 2.8% and 3.3% more removals than
SOC in 90% and 99% of the cases, respectively. Therefore,
DANCE consistently reaches an effectiveness quite similar
to that of SOC in assessing network centrality in a
distributed way. Nevertheless, in Section III-B we show that
this similar effectiveness is reached with applicability costs
that are significantly lower in DANCE than in SOC.
Fig. 3. Impact of node removal on the network giant component: (a) BA network; (b) ER network.

Fig. 4. Difference between SOC and DANCE (h = 2) in required removed nodes to network fragmentation.

Fig. 5. Messages used in SOC and DANCE (h = 2).



B. DANCE applicability costs 

We analyze the applicability costs of DANCE and SOC to
obtain the results shown in Section III-A. We consider message
complexity, convergence time, and computational costs:
• Message complexity - Figure 5 presents the number
of messages needed for assessing the centralities with
SOC and DANCE for the same two networks used in
Figure 3. The number of messages used in SOC is fixed
to 2 × 10^6 messages, as suggested by its authors for
1000-node networks [13]. When using DANCE, the number
of messages needed to assess the centrality of all the
nodes in the network depends on the network topology.
Nevertheless, as can be observed in Figure 5, the number
of messages used in DANCE is significantly lower than
the number of messages used by SOC. We also measured
the number of messages used for all the 100 BA and
100 ER networks used for constructing Figure 4. In 90%
of the cases, DANCE uses no more than 221,322 and
123,394 messages for BA and ER networks, respectively.
The maximum number of messages used by DANCE for
BA networks is 235,316 messages and for ER networks
is 128,440 messages. We remark that this is in contrast to
SOC using 2 × 10^6 messages for the same networks.
From this study, we conclude that the message cost
for distributively assessing the network centrality using
DANCE is typically one order of magnitude lower than
the cost for obtaining a similar result using SOC.

• Convergence time - Message exchange in DANCE to
assess the localized centrality is highly parallel. The
messages generated at each node must propagate through
a predetermined number of hops h, thus limiting the convergence
time of DANCE to O(1) steps regardless of the network size.
As SOC is based on a single random walk, all messages
are sequential in time, rendering the convergence time
significantly larger and dependent on the number of nodes
in the network. According to Kermarrec et al. [13], the
convergence time in SOC is O(n^3) steps, where n is the
number of nodes in the network.

• Computational costs - The localized centrality adopted 
in DANCE requires the determination of the volume of 
the h_neighborhood around each node. SOC requires 
the computation of the standard deviation of the time 
intervals between visits of the random walk. In both 
cases, computational costs are modest and fully dis- 
tributed among the nodes in the network. 
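The "one order of magnitude" statement on message cost follows directly from the reported counts. The short sketch below computes the ratios, assuming that the SOC budget is 2 × 10^6 messages, which is our reading of the value quoted for 1000-node networks; the dictionary labels are hypothetical.

```python
# Assumed SOC message budget for 1000-node networks.
soc_messages = 2 * 10**6

# DANCE message counts reported for the 100 BA and 100 ER networks.
dance_messages = {
    "BA, 90th percentile": 221_322,
    "BA, maximum": 235_316,
    "ER, 90th percentile": 123_394,
    "ER, maximum": 128_440,
}

ratios = {case: soc_messages / count for case, count in dance_messages.items()}
for case, ratio in ratios.items():
    print(f"{case}: SOC uses about {ratio:.1f}x more messages than DANCE")
```

Under this assumption, SOC uses between roughly 8 and 16 times more messages than DANCE.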

C. Correlation between DANCE and closeness centrality 



At the end of Section II-B, we argue that we expect a
high level of correlation between the node ranking provided
by closeness centrality and the node ranking provided by
DANCE. In this section, we experimentally confirm this claim
by showing the correlation between these rankings obtained
from different synthetically generated networks as well as
traces of real-world networks.

Fig. 6. Correlation between the node rankings provided by DANCE (h = 2) and closeness centrality: (a) BA network; (b) ER network.

Figure 6 shows the correlation between the rankings pro- 
vided by DANCE (h = 2) and closeness centrality for 
each node in one BA and one ER network, both randomly 
chosen among the same BA and ER networks described in 
Section III-A. We observe a high correlation in both cases: 
for the BA network the correlation coefficient between the 
rankings based on closeness centrality and DANCE (h = 2) 
is R = 0.9979, while for the ER network it is R = 0.9970. 
Considering the whole set of 100 BA and 100 ER networks, 
all results for the correlation coefficient are between R_min = 
0.9972 and R_max = 0.9986 for the BA networks and between 
R_min = 0.9962 and R_max = 0.9975 for the ER networks. 
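
A correlation between two node rankings can be computed as in the sketch below. It assumes that the coefficient is the Pearson correlation of the rank positions (that is, a Spearman-style rank correlation with tied scores sharing their average rank); the text does not state which estimator is used, so this choice and the function names are assumptions made for illustration.

```python
def average_ranks(values):
    """Rank values (1 = largest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def rank_correlation(scores_a, scores_b):
    """Correlate the rankings induced by two {node: score} dictionaries."""
    nodes = sorted(scores_a)
    ranks_a = average_ranks([scores_a[u] for u in nodes])
    ranks_b = average_ranks([scores_b[u] for u in nodes])
    return pearson(ranks_a, ranks_b)


# Toy scores for a 5-node path graph (illustrative, not from the experiments).
volume = {0: 5, 1: 7, 2: 8, 3: 7, 4: 5}
closeness = {0: 0.4, 1: 4 / 7, 2: 4 / 6, 3: 4 / 7, 4: 0.4}
r = rank_correlation(volume, closeness)
```

On this toy input both score sets order the nodes identically, so the coefficient is 1 up to floating-point error.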

We next perform experiments to analyze the correlation 
in the node rankings provided by closeness centrality and 
DANCE using the network traces specified in Table I. Routers- 
CAIDA refers to the giant connected component extracted 
from a real-world trace representing a router-level network 
topology collected by CAIDA [16]. Route Views represents a 
symmetrized snapshot of the Internet structure at the AS-level 
reconstructed from BGP tables [17]. PGP-net refers to a net- 
work of users of the Pretty-Good-Privacy algorithm for secure 
information exchange [18]. Limiting message complexity by 
setting h = 2, the correlation coefficient in the node rankings 
provided by closeness centrality and DANCE is R = 0.9066, 
R = 0.9954, and R = 0.8704 for Routers-CAIDA, Route 
Views, and PGP-net, respectively. Later, in Section III-E, we 
perform a cost-effectiveness analysis that indicates h = 2 as 
a suitable choice to balance the trade-off between message 
cost and the resulting correlation between the node rankings 
provided by closeness centrality and DANCE. 

TABLE I 
Real-world network traces. 

Network trace         Number of nodes   Number of edges   Radius 
Routers-CAIDA [16]    190,914           607,610           13 
Route Views [17]      22,693            48,436            6 
PGP-net [18]          10,680            24,316            12 


The high correlation between the node ranking provided 
by DANCE and the node ranking provided by closeness 
centrality constitutes a key outcome. Closeness centrality is a 
basic metric to analyze complex networks. To the best of our 
knowledge, however, there is no distributed method to compute 
the closeness centrality. Even if the full network topology is 
known, closeness centrality is too costly, O(n · m) where n 
is the number of nodes and m is the number of edges, to be 
applied in very large complex networks. DANCE thus provides 
a simple, efficient, and practical alternative to rank nodes in 
very large complex networks in close relation with the node 
ranking by closeness centrality. 
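
To see where the O(n · m) cost comes from, the sketch below computes closeness centrality with one breadth-first search per node. It assumes an unweighted, connected, undirected graph stored as an adjacency dictionary and uses the common normalization (n - 1) divided by the sum of distances; other normalizations yield the same ranking. The function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node, in O(n + m) time."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(adj):
    """(n - 1) / sum of distances for each node, assuming a connected graph."""
    n = len(adj)
    result = {}
    for u in adj:  # n full traversals, hence O(n * m) overall on connected graphs
        total = sum(bfs_distances(adj, u).values())
        result[u] = (n - 1) / total if total > 0 else 0.0
    return result


# Toy example: the path graph 0 - 1 - 2 - 3 - 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
closeness = closeness_centrality(adj)
```

Every traversal touches the whole graph, which is what makes this approach impractical for very large networks, whereas an h-hop computation stops after a bounded neighborhood.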

D. Optimizing the correlation with closeness centrality 

In this section, we analyze the impact of the choice of h 
on the correlation coefficient between the node rankings 
based on closeness centrality and DANCE. It is intuitive to 
expect this correlation to improve as h increases, and indeed 
that is the case up to a certain point. At this point, many 
h_neighborhoods centered at nodes that do not have similar 
closeness centrality start to present similar volume, therefore 
degrading the correlation. From this point on, an increase 
in h has a negative effect on the correlation. To illustrate 
this behavior, Figure 7 shows the correlation coefficient of 
the DANCE node ranking with the closeness centrality ranking 
as a function of h for an ER network with 1,455 nodes and 
radius 15. This is a sparse network with a large radius so 
that the effect described above can be better observed. The 
correlation gradually increases from h = 1 to h = 9 and 
then starts decreasing as h further increases (see close-up view 
around h = 9 in Figure 7). 

This analysis illustrates the DANCE properties 
discussed in Section III-B. These results also show that it is 
possible to optimize the correlation between the node rankings 
based on closeness centrality and DANCE by a proper choice 
of h. As h increases towards the network radius, there is 
a point where the h_neighborhoods of high centrality nodes 
grow more slowly than the h_neighborhoods of nodes with 
lower centrality. As a consequence, this creates a distortion 
that degrades the correlation. The h value that optimizes the 
correlation occurs just before this turning point. The exact h 
value that causes the turning point depends on the network 
topology, being limited by the network radius. Since a high 
correlation is guaranteed to be achieved with a proper h, it 
is possible to find an h that optimizes the correlation by an 
iterative process. Using this method, the optimized h values 
in DANCE for the Routers-CAIDA, Route Views, and PGP-net 
network traces are h = 4, h = 2, and h = 5, respectively. The 
resulting correlation coefficients in the rankings provided by 
closeness centrality and DANCE are very high for all network 
traces: R = 0.9943 for Routers-CAIDA, R = 0.9954 for 
Route Views, and R = 0.9955 for PGP-net. In this case, the 
adopted values for h are chosen to optimize the strength of 
the correlation, disregarding the associated message costs, which 
are analyzed in Section III-E. 

Fig. 7. Impact of h on the correlation coefficient between the node rankings 
provided by DANCE and closeness centrality. 

Fig. 8. Trade-off between the message cost and the correlation coefficient 
of the relation DANCE x closeness centrality with increasing h. 

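
One possible reading of the search for the optimizing h is sketched below: h is increased step by step and the search stops at the first value for which the correlation no longer improves. This assumes the correlation rises to a single peak and then falls, as described for Figure 7, and that the correlation for a given h can be evaluated offline (which requires the closeness ranking). The helper name and the numeric curve are hypothetical and are not measurements from this paper.

```python
def best_h(correlation_for_h, max_h):
    """Scan h = 1..max_h and stop at the first h that does not improve the correlation."""
    best, best_corr = 1, correlation_for_h(1)
    for h in range(2, max_h + 1):
        corr = correlation_for_h(h)
        if corr <= best_corr:
            break  # turning point reached; larger h would degrade the correlation
        best, best_corr = h, corr
    return best, best_corr


# Hypothetical single-peaked curve, for illustration only.
curve = {1: 0.80, 2: 0.90, 3: 0.95, 4: 0.93, 5: 0.85}
h_opt, r_opt = best_h(lambda h: curve[h], max_h=5)
```

In practice max_h would be bounded by the network radius, since the turning point cannot lie beyond it.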



E. Trade-off between h and message cost 

An increase in h causes an increase in the number of 
messages needed to obtain the volume-based centrality of 
each node. Hence, one has to consider the cost-effectiveness 
relation of increasing h. Thus, we argue that it is possible 
to find a suitable value of h, balancing the trade-off between 
the message cost and the correlation coefficient of the node 
rankings provided by DANCE and by closeness centrality. 

Figure 8 shows this trade-off for the traces of real-world 
networks (Table I). The vertical axis at the left refers to the cor- 
relation coefficient of the node ranking provided by DANCE 
and the one provided by closeness centrality. The vertical axis 
at the right shows the normalized number of messages. For 
all three networks the best trade-off between the correlation 
coefficient and the message cost happens with h = 2, i.e., 
the message cost is still low and the correlation coefficient is 
relatively high. The same is also valid for all the synthetically 
generated networks considered in this paper. This suggests 
h = 2 provides a suitable cost-effectiveness balance. 

F. Practical applicability of DANCE 

In this section, we discuss the practical applicability of 
DANCE. Computing the localized volume-based centrality 
of DANCE for the h_neighborhood of each node 
only requires the identity and degree of the nodes belonging 
to each h_neighborhood. To achieve this, DANCE can be 
implemented in different ways, ranging from a centralized 
approach running on a single CPU core to a fully distributed 
approach in which the analysis of each node runs on a separate 
CPU. From the viewpoint of the available knowledge about 
the network, this means DANCE can be used in networks 
where the topology is fully known and also in networks 
where each node is only aware of the identity of its direct 
neighbors. DANCE can provide useful centrality assessment 
(strongly correlated to closeness centrality ranking) using 
highly localized knowledge (i.e., small values for h), meaning 
either fast running time when implemented in a centralized 
way or small message cost combined with fast running time when 
implemented in a distributed approach. This flexibility allows 
a wide range of applications for DANCE for a diverse set of 
large-scale complex networks. 

As an example of the practical applicability of DANCE to 
a large-scale complex network, we apply DANCE to an 
anonymized network of YouTube users [19] with 1,134,890 
users and 2,987,624 edges. Since we have the full dataset 
describing the network, the adopted DANCE implementation 
is fully centralized, running on a single CPU core. Running 
DANCE with h = 2, we obtain the localized volume-based 
centrality values for all nodes in a few minutes because of 
the relatively low applicability costs of DANCE (see Sec- 
tion III-B). Using the same computational resources to run 
a traditional implementation of closeness centrality (such as the 
one found in the networkx library [20]) would take several 
days, if feasible at all, due to the high computational costs. 
Similarly, the high convergence time of SOC and the difficulty 
in determining its stopping criterion render its practical 
application to large-scale networks such as this one infeasible. 

IV. Related Work 

There are many centrality measures used with the purpose 
of assessing the relative importance of different nodes on 
a network under different criteria, such as its capacity for 
information diffusion or its relevance for connectivity. The 
best known examples are the traditional degree, betweenness, 
closeness, and eigenvector centralities l^-fTl. 

Computing most of the traditional centralities is in 
general computationally expensive and requires full knowledge 
of the network topology. Therefore, some recent efforts are 
dedicated to optimizing the way in which traditional centralities 
are calculated or approximated p)-pO). Dinh et al. [21] pro- 
pose a new model to assess network vulnerabilities, formulating 
it as an optimization problem that can render approximate so- 
lutions with provable performance bounds. These methods, 
however, still require full knowledge of the network topology 
to compute a centrality approximation, hindering their appli- 
cability to large-scale networks where such information is 
unavailable and a distributed implementation is required. 

Alternatively, like our proposal, some previous work inves- 
tigates methods to assess network centrality in a distributed 
way without requiring full knowledge of the network topol- 
ogy [11]-[13]. Lehmann and Kaufmann pT) propose a frame- 
work for computing shortest-path based centralities, such as 
closeness and betweenness, in a decentralized way, but their 
proposal is still computationally expensive for application to 
large-scale complex networks. Nanda and Kotz iTTl propose 
a new centrality metric called Localized Bridging Centrality 
(LBC). LBC provides a specialized centrality, targeted at 
locating bridges (i.e., edges whose removal disconnects the 
network), using only one-hop neighborhoods around each 
node. The proposed use of LBC is on relatively small- 
scale wireless mesh networks. One of the main motivations 
behind the LBC proposal is a paper by Marsden [22], which 
shows empirical evidence that localized centrality measures 
computed for one-hop neighborhoods are highly correlated with 
a global centrality measure. In this paper, we extend this notion 
by proposing DANCE and showing that its localized volume- 
based centrality correlates well with closeness centrality. 

Kermarrec et al. [13] propose a new centrality measure, 
called Second Order Centrality (SOC). This method has the 
same goal as ours in assessing network centrality in a dis- 
tributed way without requiring full knowledge of the network 
topology. Nevertheless, because it relies on perpetual random 
walks, SOC has a potentially long and undetermined convergence 
time as well as a high message complexity. This is in contrast with 
DANCE, which offers a similar result to SOC in terms of 
effectiveness in assessing network centrality in a distributed 
way, but with a faster and deterministic convergence time 
and a much lower message complexity. 

V. Conclusion 

In this paper, we propose DANCE, a novel distributed 
method to assess network centrality in complex networks 
without requiring full knowledge of the network topology. 
DANCE computes a localized volume-based centrality at 
each node considering only a limited neighborhood around 
every node. DANCE also provides a navigation procedure 
allowing the location of the most central nodes. Compared 
with previous work, DANCE achieves similar effectiveness, 
but with significantly lower applicability costs. 

Another key outcome is that the node rankings provided by 
DANCE and closeness centrality correlate very well. We show 
that h = 2 presents a suitable trade-off between limited message 
costs and this high correlation. This depends on the network 
radius not being large compared to h. Most complex networks 
of interest present the small-world property (i.e., small radius 
compared to network size), thus rendering DANCE applicable 
to the practical analysis of these networks. Overall, DANCE 
contributes a simple and efficient alternative to assess 
network centrality in large-scale complex networks. 

Most complex networks also present dynamic behavior. As 
future work, we plan to investigate how DANCE can con- 
tribute to the analysis and modeling of the dynamic behavior 
of complex networks. 